EvoClass
AI012
Deep Dive into Large Language Models
Autonomous Agents, RLHF, and Safety Alignment
Learning Objectives
- Analyze the architectural components of GUI agents, including planning, decision-making, and reflection modules in multi-agent systems.
- Explain the mechanics of Reinforcement Learning (RL) and Reinforcement Learning from Human Feedback (RLHF), specifically the role of reward models and Proximal Policy Optimization (PPO) in aligning agent behavior with human values (see the sketch after this list).
- Evaluate safety risks and reliability issues in autonomous agents, including Out-of-Distribution (OOD) errors, jailbreak attacks, and environmental distractions.
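To ground the PPO objective named in the second objective, here is a minimal, illustrative Python sketch of the clipped surrogate used during the RL step of RLHF. All function names and numeric values are hypothetical and chosen only to show how clipping bounds the policy update; a real pipeline derives the advantage from a learned reward model's score, typically with a KL penalty against a reference policy.

```python
import math

# Illustrative sketch only (hypothetical names and values): the per-token
# PPO clipped surrogate used during the RL step of RLHF.
def ppo_clipped_objective(log_prob_new, log_prob_old, advantage, epsilon=0.2):
    """Return min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the
    probability ratio between the updated policy and the rollout policy."""
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped_ratio = max(min(ratio, 1 + epsilon), 1 - epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)

# In RLHF, `advantage` comes from the reward model's score on the sampled
# response (minus a KL penalty in most setups). Made-up numbers below just
# demonstrate the clipping behavior:
print(ppo_clipped_objective(-1.4, -1.5, 2.0))  # ratio ~1.11, inside clip range: ~2.21
print(ppo_clipped_objective(0.0, -2.0, 2.0))   # ratio ~7.39, clipped at 1.2: 2.4
```

The clip is the key design choice: even when the reward model strongly favors a token (a large ratio), the update to the policy stays conservative, which helps keep the fine-tuned model close to its reference behavior.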